Piecewise convexity of artificial neural networks
Abstract
Although artificial neural networks have shown great promise in applications including computer vision and speech recognition, there remains considerable practical and theoretical difficulty in optimizing their parameters. The seemingly unreasonable success of gradient descent methods in minimizing these non-convex functions remains poorly understood. In this work we offer some theoretical guarantees for networks with piecewise affine activation functions, which have in recent years become the norm. We prove three main results. First, that the network is piecewise convex as a function of the input data. Second, that the network, considered as a function of the parameters in a single layer, all others held constant, is again piecewise convex. Third, that the network as a function of all its parameters is piecewise multi-convex, a generalization of biconvexity. From here we characterize the local minima and stationary points of the training objective, showing that they minimize the objective on certain subsets of the parameter space. We then analyze the performance of two optimization algorithms on multi-convex problems: gradient descent, and a method which repeatedly solves a number of convex sub-problems. We prove necessary convergence conditions for the first algorithm and both necessary and sufficient conditions for the second, after introducing regularization to the objective. Finally, we remark on the remaining difficulty of the global optimization problem. Under the squared error objective, we show that by varying the training data, a single rectifier neuron admits local minima arbitrarily far apart, both in objective value and parameter space.
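The claims about piecewise convexity in the parameters, and about local minima of a single rectifier neuron under squared error, can be illustrated with a toy case. The sketch below is not the paper's construction; it assumes a scalar weight `w`, no bias, and a hypothetical two-point training set with inputs `x = (1, -1)` and targets `y = (a, b)` where `a, b > 0`. Under those assumptions the objective equals `(w - a)^2 + b^2` for `w >= 0` and `a^2 + (w + b)^2` for `w <= 0`, so it is convex on each half-line but not on the whole line, with local minima at `w = a` and `w = -b`. The helper names (`loss`, `is_midpoint_convex`, `demo`) are illustrative only.

```python
def relu(z):
    """Rectifier activation, a piecewise affine function."""
    return max(0.0, z)


def loss(w, xs, ys):
    """Squared error of a single bias-free rectifier neuron with scalar weight w."""
    return sum((relu(w * x) - y) ** 2 for x, y in zip(xs, ys))


def is_midpoint_convex(f, lo, hi, steps=50, tol=1e-9):
    """Numerically test f((p+q)/2) <= (f(p)+f(q))/2 on a grid over [lo, hi]."""
    pts = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return all(f((p + q) / 2) <= (f(p) + f(q)) / 2 + tol for p in pts for q in pts)


def demo(a, b):
    """Assumed toy data: inputs (1, -1), targets (a, b) with a, b > 0."""
    xs, ys = [1.0, -1.0], [a, b]

    def f(w):
        return loss(w, xs, ys)

    return {
        # Each half-line has a fixed activation pattern, so f is a convex quadratic there.
        "convex_on_w_ge_0": is_midpoint_convex(f, 0.0, 2 * a),
        "convex_on_w_le_0": is_midpoint_convex(f, -2 * b, 0.0),
        # Across the kink at w = 0 the convexity test fails.
        "convex_on_whole_line": is_midpoint_convex(f, -2 * b, 2 * a),
        # Local minimizers, one per piece, and their objective values.
        "minima": (-b, a),
        "minimum_values": (f(-b), f(a)),
    }


if __name__ == "__main__":
    print(demo(1.0, 3.0))
```

In this toy setting the two minimizers are `a + b` apart in parameter space and their objective values differ by `|a^2 - b^2|`, so changing the targets moves them as far apart as desired. This is consistent with the abstract's final remark, though the paper's own argument may differ.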
Journal title:
- Neural networks : the official journal of the International Neural Network Society
Volume 94
Publication date 2017